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Abstract 



P-H This article 1 presents differential equations and solution methods for 

the functions of the form A(z) = F _1 (G(z)), where F and G are cumu- 
lative distribution functions. Such functions allow the direct recycling of 
samples from one distribution into samples from another. The method 
may be developed analytically for certain special cases, and illuminate 
the idea that it is a more precise form of the traditional Cornish-Fisher 
^.J expansion. In this manner the model risk of distributional risk may be 

£H assessed free of the Monte Carlo noise associated with resampling. The 

method may also be regarded as providing both analytical and numerical 
bases for doing more precise Cornish-Fisher transformations. Examples 
are given of equations for converting normal samples to Student t, and 
converting exponential to hyperbolic, variance gamma and normal. In the 
case of the normal distribution, the change of variables employed allows 
the sampling to take place to good accuracy based on a single rational 
QO approximation over a very wide range of the sample space. The avoidance 

of any branching statement is of use in optimal GPU computations, and 
\Q we give example of branch-free normal quantiles that offer performance 

improvements in a GPU environment, while retaining the precision char- 
j acteristics of well-known methods. 
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1 Introduction 

The construction of Monte Carlo samples from a distribution is facilitated if one 
has a knowledge of the quantilc function w(u) of a distribution. If F(x) is the 
cumulative distribution function, then the quantilc w(u) is the solution of the 
equation 

F(w{u))=u. (1) 

A knowledge of the function w(u) makes Monte Carlo simulation straightfor- 
ward: given a random sample U from the uniform distribution, a sample from 
the target distribution characterized by f(x),F(x) is 

X = w(U) . (2) 

While it is commonplace to use the uniform distribution on the unit interval as 
the base distribution for sampling, there is in fact no need to do so. For example, 
a great deal of intellectual effort has been expended on highly efficient sampling 
from the normal and other well-known distributions. Given such samples, can 
we leverage the work done to create samples from other distributions in an 
efficient manner? This article will address this question in the affirmative. In 
principle the answer is trivial: given a sample Z from a distribution with CDF 
G(x), we first work out G(Z) which is uniform. Then we can apply the quantilc 
function F^ 1 (x) associated with a target distribution with CDF F and form 
F~ 1 (G(z)) as a sample from that target distribution. In general F, G and their 
inverses can be rather awkward special functions (see e.g. [13]) , so a direct 
route to the object A(z) = F~ 1 (G(z)) would be helpful. 

There are at least two ways of developing this idea. One route is to postulate 
interesting forms for the composite mapping. This has been explored by Shaw 
and Buckley [15] based on Gilchrist's theory of quantile transformations [8]. 
In this way we can find skew and kurtotic variations of any base distribution, 
while avoiding, in a controlled manner, the introduction of "negative density" 
problems that arise in traditional Gram-Charlier methods. The second route is 
to try to simplify the mapping given a choice of F and G. Such a route can be 
found by the method of differential equations for quantilc functions developed 
by Steinbrecher and Shaw [16]. In the net section we will give a brief review of 
that approach. 

A particular application of our approach will be to present new methods 
of constructing the normal quantile by first filtering it through a two-sided 
exponential distribution. We will show that this offers a useful performance 
benefit in a GPU environment, where branching algorithms may be subject to 
significant performance penalties. Our change-of-variables approach will allow 
costly branching to be avoided and we will demonstrate the benefits in the 
CUDA environment for programming NVIDIA GPUs. 

2 Quantile mechanics 

If f(x) is the probability density function for a real random variable X, the first 
order quantile ODE is obtained by differentiating Eqn. (1), to obtain: 



(3) 
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where w(u) is the quantile function considered as a function of u, with < u < 1. 
Applying the product rule with a further differentiation we obtain: 

„, . ..d 2 w(u) „,. . ../ dw(u)\ 2 n ... 

fw»))^y + {-ir) =°- (4) 

This may be reorganized as 

d 2 w(u) f dw(u)\ 2 

where 

H(w) = -^-lo g {f(w)}. (6) 

and the simple rational form of H(w) for many common distributions, particu- 
larly the Pearson family, allows analytical series solutions to be developed [16]. 
This last equation we refer to as the second order quantile equation. 

2.1 The Recycling ODE 

Now suppose that we make a change of independent variable in the second order 
quantile equation Eqn (8). We let v = q(u), and regard w as a function of v. 
We write w(u) = Q(v), where v = q(u). Elementary application of the chain 
rule and some algebra gives us: 



dv 2 [q'(u)] 2 dv \ dv 

In general this is a rather awkward differential equation. However, when we 
regard q(u) as being itself a quantile function, we can make some simplifications. 
If q(u) is a quantile mapping, it satisfies an ODE of the form 

r^y . ( 8) 



du 2 \ du 

where 

H( w ) = -±-l 0g {f(w)} . (9) 

and / is the probability density function associated with the quantile q(u). So 
we can simplify the ODE to 

and bearing in mind that v = q(u) we arrive at the "Recycling Ordinary Differ- 
ential Equation" : 
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2.2 The Recycling ODE for a Gaussian background 

In this case we have the following obvious sequence of manipulations: 

/>) = 4^^ x2/2 (12) 

log/(aO = -l/21og(27r) - x 2 /2 (13) 

^log/>) = -z (14) 

= w (15) 
and we arrive at the Recycling ODE for a Gaussian background as 

d 2 Q(v) dQ(v) _ (dQ{vY " 



This is an interesting example to consider for target distributions along the 
entire real line. 

2.3 The Recycling ODE for a one-sided exponential back- 
ground 

In this case we have the following obvious sequence of manipulations: 

f(x) = e- x , logf(x) = -x, ^log/Or) = -1, H(v) = l (17) 
and we arrive at the Recycling ODE for a exponential background as 

^) + ^ =H{Q{v)) (dom\ (18) 



dv 2 dv \ dv 

This is an interesting example to consider for target distributions along the 
positive real line. For distributions that are asymptotically exponential in both 
directions it can be used in two pieces. 

3 Example with a Gaussian background 

In a Gaussian background we work with the Recycling ODE in the form 

Q" + vQ' = H{Q){Q') 2 (19) 

where the explicit dependence on v is suppressed for brevity, and ' denotes d/dv. 
The target distribution is encoded through the function H. Note that it is not 
required in any sense that the target distribution is "close" to, or asymptotic 
to a Gaussian. This is an exact relationship governing the function Q that is 
the composition of the Gaussian CDF followed by the ordinary quantile of the 
target distribution. But such a relationship must contain all information rele- 
vant to the creation of an expansion of one distribution in terms of another. In 
particular, we should be able to re-create known and new expansions of Cornish- 
Fisher type. Generalized Cornish-Fisher expansions have been considered in the 
notable paper by Hill and Davis [9], but the step to considering the matter as 
the solution of a single differential equation is, so far as this author is aware, a 
new one. 
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3.1 The Student distribution 

This is an interesting case for several reasons: 

1. We can illustrate the method; 

2. We can recover a well known asymptotic series; 

3. We can develop that series to arbitrary numbers of terms; 

4. We can explore the limitations of the known series; 

5. We can develop an alternative numerical method and explore purely nu- 
merical options. 

The f/-function for the Student case can be written down as 

*" (0 >=H)ttw; (20) 

and the Recycling ODE can be written in the form 

1 + ?) (°" + VQ ') = { 1 + l) QiQ ' )2 (21) 

We note that if we let n — > oo we obtain 

Q" + vQ' = Q{Q') 2 (22) 

and this has the desired solution Q — v. More generally we can look at series 
solutions, but should be mindful of the fact that the term Q 2 /n is present - this 
is a hint that the behaviour of series for Q <C \fn and Q ^> y/n could be rather 
different. Such considerations do not always apply if one is thinking in a purely 
asymptotic framework. For any finite n, no matter how large, there will always 
be values of Q such that the behaviour is far from Gaussian. This was alluded 
to in [13], where it was noted that the known Cornish-Fisher expansion always 
goes wrong in the tails as some point. 

We also need to consider boundary conditions. The derivative of any ordi- 
nary quantile function at a point is the inverse of the PDF at the corresponding 
quantile. We first work around the point u = 1/2 which corresponds to v = 
in the Gaussian coordinate. If z(u) and t(u) are the ordinary quantiles then we 
have 

2(1/2) = 0, 
z'(l/2) = V2^ 

t(l/2) = 0, (23) 
r[n/2] 



^ = ^r[(n + l)/2] 

It follows that the centre conditions we wish to apply to the Recycling ODE are 
just: 

0(0) = o, 

T[n/2] (24) 



O ' (0) = ^V2r[(, + i)/2] 

where the latter expression 7 arises as the ratio of the derivatives. 
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3.2 The central expansion 

We now develop a series solution about the centre, and we expect that it will be 
reasonable to treat the solution as "close to Gaussian" if Q 2 <C n. We assume, 
as both the normal and Student quantiles are symmetric, that 

oo 

Q(v) ~5>« 2fe+1 (25) 

fc=0 

where cq — 7. We use the tilde notation to indicate that at this point we have 
no presumption as to whether the resulting series will be convergent for all v or 
form some kind of asymptotic series. We find that 

(n + l)-f 3 — n-f 

® n (26) 
_ (in 2 + 8n + 1) 7 5 + (-10n 2 - lOn) 7 s + 3n 2 7 V ' 

C2 = l2Cm2 

Subsequent terms may be generated by iteration of the RODE, and in this case, 
after some algebra, we find that 



(2i + 3)(2t + 2)c l+1 = - (2i + l)cj 

i i—l 
+ a lm{n)Ci-t 

— mC-l C71 

1=0 r> 

0(i) 



Z=0 m=0 

i-1 i-l-i 



(27) 



X] X! ( 2m + 1 ) C »-l-«-mCjC T , 



n 

i=0 m=0 



where 6»(0) = 0, 0(i) = 1 if i > 1, and 

a im (n) = (1 + i)(2Z + l)(2m + 1) - ^m(2m + 1) (28) 

3.3 The tail expansion 

We now develop a series solution about the right tail Q — > 00. We begin by 
assuming that Q 2 3> n. The Recycling ODE becomes 

Q{Q" + vQ') = (n+l)(Q') 2 (29) 

Following some experimentation, we make the change of variables 

and this reduces the ODE to 

P"(v) + vP'(v) = (31) 
The solution of this satisfying the condition that P(v) — » as v — > 00 is 

P(v) cc erfc(-^) (32) 



W.T. Shaw <fe N. Brickman: Monte-Carlo recycling <fe GPU quantilcs 



7 



and wc deduce that for some constant c, 



Q{v) ~ d 



2 erfC( ^ ) 



-i -l/n 



We see that the solution has emerged naturally as 



Q(v) 



l-<f>(v) 



-1/r, 



(33) 



(34) 



where <& is the Gaussian CDF. The asymptotic differential equation is scale 
invariant so we have to determine d by other means. It is possible that it might 
be possible to determine it by a matching argument, but it is simpler to now 
appeal to other known properties of the Student distribution. In [13] the tail 
behaviour of the Student CDF was determined (see Eqns. (60-62) of [13]) and 
we can deduce that 



d 



- r O/ 2 ) 
r((n+l)/2) 



-l/n 



(35) 



If we step back from these calculations it becomes clear what is happening. The 
Recycling ODE is starting to reconstruct a solution that combines the change of 
variable w = 1 — $(w) with the asymptotic power series of the ordinary Student 
quantilc. 



3.4 Comparison with traditional asymptotics 

Expansions of Cornish-Fisher type can be found in the statistics literature. One 
that is reasonably well known is the expansion of the Student random variable 
t in terms of the Gaussian random variable z, for larger values of the degrees of 
freedom n. It is quoted, for example, as identity 26.7.5 of [3]. 



t =z + 



z 3 + z 



5z 5 + I6z 3 + 3z 3z 7 + 19z 5 + 17z 3 - I5z 



+ 



4n 

79z 9 + 776z 7 - 



96n 2 
1482z 5 



384n 3 



1920z 3 - 945z 



(36) 



92160n 4 



+ 



An equation of true Cornish-Fisher type (cf identity 26.2.49 of [3]) can be ob- 
tained by transforming (provided n > 2) to a variable s with unit variance: 
s = tyjl — 2/n and re-expanding in inverse powers of n. That Eqn. (36) is 
somehow incomplete is evident by the fact that z appears in every term, z 3 in 
all but the first, and so on. The matter is resolved nicely by first observing that 



7 = 1- 



1 

4n 



1 

32 







a) 




/ 128 






which sums up all the z-terms. Similarly 



1 1 

4n 6 



+ 



17 

384 



1 

48 




, (37) 



(38) 



and so on. So the series solution of the differential equation constitutes a re- 
summation of the known asymptotic series where the coefficient of each power 
of z is computed exactly. 
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3.5 Accuracy and numerical methods 

We now turn to the quality of the results. This can be assessed precisely by the 
use of an exact representation of the composite function F^ 1 (^(z)), where $ is 
the normal CDF and F n the Student CDF. The exact formula for the Student 
CDF for all real n is given in terms of inverse beta functions by Shaw [13], and 
there are known simpler forms for n — 1,2,4. These arc also given in [13] and 
are also now available on the Wikipedia page on quantilc functions [12]. The 
case n = 4 is an interesting case as it is known exactly, is the boundary case 
where kurtosis is infinite, and there is some evidence from work by Fergusson 
and Platen [7] that it is a good case for modelling daily world index log-returns. 
We shall therefore develop this in some detail. It turns out that working as far 
as cio is a useful point. A detailed calculation shows that the precision (i.e. 
relative error) of the central power series is then less than 2 x 10~ 5 on \z\ < 4. 
For this case we find that 

7 = \ \j\ ~ 1 -06384608107048714 (39) 

and the full C-code form for the central series is, with y — z * z, 

t = z* (1.06384608107048714 + 

y* (0.0735313753642658509 + 
y* (0.00408737916150927847 + 

y* (0.000157376276663230562 + 
y*(4.31939824140363509e-6 + 
y*(9.56881464639227278e-8 + 
y*(2.09256881803614446e-9 + 
y*(3.87962938209093352e-ll + 
y*(2.72326084541915671e-13 + 
(2.90528930162373328e-15 + 
4 . 59490133995901375e-16*y) *y) ) ) ) ) ) ) 

)) 



To treat the tail regions \z\ > 4 with corresponding accuracy when n = 4 it is 
sufficient to use just two terms of the known tail series. This gives us, in general, 
for the positive tail (the negative tail being treated by symmetry) 

-™^F(^ (4Q) 

and for the case n = 4: 

% (41) 
t = 2w-^(l - ^w 1 ? 2 ) 

The optimal crossover is then in fact at z = 3.93473 with maximum relative 
error less than 1.4 x 10~ 5 over the entire range of z 
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3.6 Purely numerical methods 

The analysis for the Student t case, although rather specialized, also allows 
the appraisal of direct numerical schemes. The direct numerical solution of the 
RODE can be done using standard methods. Within Mathematica version 6, the 
use of NDSolve with high precision and accuracy goals, explicit Rungc-Kutta 
and sixth-order differences leads to an precision of better than 5 x 1CP 8 on the 
range \z\ < 6, which is excellent. Of course, one must also consider sampling 
efficiency issues arising from such interpolated numerical schemes, but they can 
be made the basis of a further, e.g. rational approximation if speed is an issue. 
Such a numerical scheme will be exploited in the examples considered below. 

4 Hyperbolic and Variance Gamma 

In this section we move to other distributions of interest to finance. First we 
consider the hyperbolic distribution, and then the variance gamma. These will 
have in common a non-normal base distribution, and will illustrate the use of a 
2-sided exponential base instead. 

4.1 Hyperbolic quantile from exponential base 

This was originally motivated by Bagnold's classic study of sand [4], and was 
given a clear mathematical description by Barndorff- Nielsen [5], who also gen- 
eralized it. The applications to finance have been explored Eberlein and Keller 
[6]. A direct treatment of the quantile function for the symmetric case has 
been given by Xiong [19]. He we shall explore the conversion of samples from 
a suitable exponential distribution to samples from the hyperbolic. Hyperbolic 
distributions can of course be sampled as random mixtures of a normal distri- 
bution. Our method facilitates the use of hyperbolic marginals coupled to an 
arbitrary copula, and and this example also illustrates how cleanly the choice 
of a suitable base simplifies the computations of the quantile - the exponential 
base regularizes the tail in an elegant way. 

The probability density function is known explicitly as 

f(x, a, (i, S, ft) = 2a5 ^ 5i) exp{-a + (x - ^ + [3(x - M )} (42) 

where 7 = i/a 2 — /3 2 , with < a > 0. In what follows we shall translate the 
origin so that /j, = 0, with density 

f(x, a, p, S) = 2a& ^ & ) expj-av^ 2 + + fix} (43) 

The ff-function for the target distribution is given by the negative of the loga- 
rithmic derivative: 

^x) = -^f{x,a,P,S) = w =^-0 (44) 

and it is evident that for large x, 



H(x) ~ sign(x)a — = ±a — (3 . (45) 
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Bearing mind that the exponential distribution is characterized by a constant 
iJ-function, we will use a pair of exponential distributions for the base case. 
In order to get the proportion of the random variables that are positive and 
negative correct, we let 



P+ = l dX 2a5K 1 (8 1 ) eM-ccVt^ + Px} 
so clearly p + + p_ = 1. 



(46) 



r^(a-/3)e-(-«- ifx>0, 
MJ U_(a + /3)e( Q +^ itKO, 1 ' 



The quantile function for sampling from /o has the trivial form: 

f^glog(w/p-) ifu<p_, 
{^^Sd 1 - U )/P+) itu>p-, 



(48) 



So samples from the base can be made easily. To convert them into samples 
from the hyperbolic we solve a left and right differential equation. The right 
problem is of the form 

on v > with the initial condition Q(0) = and 

= q^T) = -JW = p+{a ® 2 T Kl{Sl)e (50) 

The left problem is 

dv 2 dv V v/<5 2 + Q 2 / V rfu / 

on w < with the initial condition Q(0) = and 

The solution to this differential system is readily visualized as a kind of 'QQ' 
plot. If we use a sixth-order explicit RK method as before, with parameters 
a = 1 = 6, (3 = for illustration, the result is show below, together with the 
identity map (diagonal line). 

4.2 VG quantile from exponential base 

The variance-gamma density was introduced by Madan and Seneta [10] as a 
model for share market returns. The density is given, forA>0,a>0,|/3|<o:, 

by 



e^M A -5 (a 2 -(3 2 ) K x _i{a\x 
(2a) A -V2^r(A) 2 



(53) 
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Figure 1: QQ Plot for conversion of exponential to hyperbolic 



In the region x > the ii-function is given by 
H(x)- 



d i rt\ aK *-3/2(ax) l-A 

^-log(,f) = — ^ — r -/3~ (a-|8 + + 

ax K\_i/2(oiX) x 



(54) 



In the region x < the _ff -function is given by 

ui \ d i rt\ aK X - 3 /2(-ax) 1 - A / / 1 

= - — log(/) = — ^ r -^~ -{a + p)+ + 

ax &\-i/2\~ ax ) x \ \x 

(55) 

These asymptotic relationships suggest that the VG model may be treated in a 
similar way to the hyperbolic case, as the asymptotics are closely related with 
a good match to the exponential base. This time the probabilities p± are given 
by 



(a 2 -/? 2 )' 



o2A-l / 



dxe !ix x x ^K x _i{ax) 

a+0 



(2a)^V2^r(A) J 

r(A+i) 2 F 1 (2A,A;A+l;f±£ 



(a 2 -/? 2 )'' 



(2a)^-V2^r(A) J 



7rr(A+ 1) 

dxe- f3x x x -^K x _i{ax) 



(56) 



y^r(A + i) 

where we have used identity 6.621.3 from [3] to evaluate the integrals giving the 
probabilities that the VG random variables is positive or negative. It is easily 
checked that if j3 = then p + = P- = 1/2. 
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The difference between VG and hyperbolic is that in the case of VG the 
details of what has to be done are sensitive to the value of A. First, we note 
that if A = 1 the VG model is trivial as it is identical to the base, so that 
Q(v) = v. If A > 1 matters remain reasonably straightforward, as both / and 
H exist at v = 0, with H(0) = 0. The recycling ODE may be solved as before, 
though many steps may be needed near v = if A remains close to and just above 
1. When < A < 1 matters are more complicated, as then H(0) is divergent, 
and furthermore the density becomes singular in the range < A < 1/2. The 
density has a log divergence when A = 1/2, and otherwise diverges as x 2A_1 . 
All of these issues may in principle be addressed by doing analytical estimates 
in a small neighbourhood of the origin and starting the numerical treatment 
at a small distance from the origin - as noted several different cases must be 
considered and full details will be given elsewhere. 

5 Normal samples from exponential 

The construction of the normal quantile, also known as "probit" has a long 
and interesting history - see [16] and the references therein for details. Here 
we consider the construction of normal samples from exponential samples, and 
proceed to a detailed practical implementation. We work on the right hand 
region and extend the mapping to the left region by odd symmetry. The recyling 
ordinary differential equation in the right hand region, v > is simply 

d 2 Q | dQ =Q (dQ\ 2 
dv 2 dv \dv J 

with the initial conditions Q(0) = 0, Q'(0) = y/n/2. This has the formal 
solution 

Q(v) = $~ 1 (1- l/2e~ v ) (58) 

Where $ is the normal CDF. To extract useful representations we proceed as 
follows. This equation may first be solved by the method of series. However, 
the resulting solution turns out to be an asymptotic series best used to a small 
number of terms in a neighbourhood of the v = 0. The series solution is easily 



(57) 
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found to be, using exact coefficients: 

1 pK 2 (2^+7r 3 / 2 ) W 3 (^+3^ 3 / 2 ) 



Q(v) 



v - -\ -v 



+ 



2 2 V 2 Uy/2 24^2 

(4^ + 50tt 3 / 2 + 7tt 5 /2) w 5 + 180 ^3/2 + 1057r 5/2) w e 

480\/2 2880^2 
+ 1204tt 3 / 2 + 1960tt 5 / 2 + 127tt 7 / 2 ) v 7 
40320^ 

(2^/tt + 966^ 3 / 2 + 3675^ 5 / 2 + 889tt 7 / 2 ) 
80640^2 

(16^ + 24200tt 3 / 2 + 194628tt 5 / 2 + 117348tt 7 / 2 + 4369tt 9 / 2 ) v 9 

5806080\/2 

(16^+ 74640tt 3 / 2 + 1190700tt 5 / 2 + 1493520tt 7 / 2 + 196605tt 9 / 2 ) v w 

58060800^ 

+ 0(« n ) 

(59) 

While this expression is interesting, it does not work far enough out to be of 
much practical use, so a different approach is needed - if we wish to retain the 
use of the above expression we would need to patch in another algorithm. One 
could consider solving the differential equations about several points. However, 
an important point for modern computation is to try to avoid "IF" statements 
in the computer implementation. Such branches do not make use of the best 
features of modern GPU systems, such as the NVIDIA Tcsla system [17]. The 
standard rational approximations all have breaks as follows in the positive quan- 
tilc region Z > 0, 0.5 < u < 1: 

• Wichura's AS241 [18]: two breaks, at u = 0.925 and u = 1 - e~ 25 . This 
includes a ^/log() transformation in the tail. 

• Moro [11]: breaks at u = 0.92. This includes a loglog() transformation in 
the tail. 

• Acklam Level 1[1]: breaks at u = 0.97575. This includes a ^/log() trans- 
formation in the tail. 

Wichura's model is double precision, and the iterated Acklam (Level 2) model 
is described as "machine precision". The non-iterated Level 1 Acklam model 
is popular in financial applications and has maximum relative error less than 
1.15 x 10~ 9 . We shall use this as a target for fast single-precision computation. 

It is desirable to avoid branches with an expensive computation in the tail. 
This is because in GPU architectures it is possible that the performance of many 
threads within a single 'warp' is determined by the slowest branch. Given the 
location of the branches it is probable that any given warp contains a thread in 
the tail arising from random sampling in the unit interval, thereby reducing the 
entire warp to the speed of the tail. 

How do we we avoid the break, at least for most practical computations? The 
first thing to point out is that the "break" at u = 1/2 is fictitious in practical 
applications. It is more sensible to work on a half region, e.g. 0.5 < u < 1, 
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an output both Z = and —Z for simulation purposes, i.e. always work 

antithetically. So we focus on the real breaks as in the list above. This break 
arises in standard approaches due to the fact that the standard quantilc <I> _1 (it) 
has rather a split personality - it is slowly varying in the central region where u 
is between a half and about 0.9, and then diverges to infinity as u — > 1_. This is 
shown in Fig. 2. If we work in an exponential base the situation changes. The 




0.6 0.7 0.8 0.9 1.0 



Figure 2: The normal quantile in standard coordinates 

function Q(v) = <I> _1 (1 — l/2e~ v ) is shown in Fig. 3 for the region < v < 37. 
This function now has a much simpler quality and we can aim to build a single 
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Figure 3: The normal quantile in exponential coordinates 

useful rational approximation. It is then a matter of picking a target range 
and precision for the desired result. In Fig. 3 we have plotted the function 
in the range < v < 37, which is equivalent to the u-range [0.5, 1 — e~ 37 ] = 
[0.5, 1-5.55 x 10" 17 ], and the Z-range < Z < 8.3236. So we would not expect 
to visit the region outside this for sample sizes less than about 10 16 . Crudely, 
we are safe for samples of no bigger than a million billion. We we shall work on 
v £ [0, 37]. For precision we shall use the Acklam level one algorithm as a target. 
It is then a matter of taking a rational approximation of sufficient degree. This 
was explored using the high-precision arithmetic of Mathcmatica to work out 
the normal quantile deep into the tail, and the function MiniMaxApproximation 
to create the rational approximation. The function actually approximated was 



(60) 
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and the power series for Q was used in a small neighbourhood of the origin 
to allow MiniMaxApproximation to work preserving precision near the origin, 
where Q(0) = 0. The settings employed for the computation were 

• Brake -> 10, 10; 

• WorkingPrecision -> 20; 

• Maxlterations -> 300; 

and a rational approximation of degree (7, 7) was found with the desired ac- 
curacy. The relative error is show in Fig. 4 and is less than 1.06 x 10~ 9 on 
< v < 37. 



9 

!) 










11 
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Figure 4: Precision of exponential-normal quantilc on [0,37]. 

The resulting form for Q(v) is as follows 

Q(v)=v*P{v)/Q(v) (61) 

where P and Q are polynomials of degree 7, with nested C-forms as follows, 
where we produce the higher-precision output generated by Mathcmatica. The 
numerator P is 

1.2533141359896652729 + 

v* (3. 0333178251950406994 + 
v* (2. 3884158540184385711 + 

v* (0.73176759583280610539 + 

v* (0.085838533424158257377 + 

v* (0.0034424140686962222423 + 
(0.000036313870818023761224 + 
4 . 3304513840364031401e-8*v) *v) ) ) ) 

) 

and the denominator Q is 

1 + v* (2. 9202373175993672857 + 

v* (2. 9373357991677046357 + 
v*(l. 2356513216582148689 + 
v* (0.2168237095066675527 + 

v* (0.014494272424798068406 + 
(0.00030617264753008793976 + 
1 . 3141263119543315917e-6*v) *v) ) ) ) 

) 
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For completeness, an algorithm for normal samples based on this under standard 
conditions is (in the first two steps we give in brackets the better form using 
a reflection and scaling to simplify the first part and avoid precision reduction 
near unity): 

• sample u in 1/2 < u < l(or, better, < u < 1); 

• evaluate v = — log[2(l — «)], (then, better, v = — log[u]); 

• evaluate Z = Q(v) with Q given by the rational approximation; 

• output the antithetic pair Z,—Z. 

If an exponential base is used we are essentially employing the last two steps. 

How reasonable is it to claim that this algorithm is "essentially IF-less"? 
One test is to ask what would happen if we introduce a very small u into the 
algorithm above - what is then the margin of error if it is generates a value of 
v > 37? The precision does then deteriorate to levels above the Acklam target, 
but very slowly. Below v = 50, corresponding to u differing from an end-point 
by about 1CP 22 , the precision remains at better than 1CP 6 . If we double the 
u-range to 74, where u is 0(1CP 33 ), the precision is still better than 2 x 1CP 5 . 
So we can safely use the breakless algorithm on the basis that if a fluke sample 
falls outside its very wide formally-defined range the answer returned remains 
very good. For example, with v = 74 the exact result is Z — 11.94047 and 
the rational approximation yields Z — 11.94084. We get this stability due to 
the nice behaviour of the exponentially-transformed quantile, and this is then a 
safe algorithm for use with single-precision arithmetic, which is the particular 
strength of a GPU. 

Of course, another very simple approach to preserving precision and avoiding 
an "IF" in the code is to sample the tail interval completely separately and 
apply a transformed quantile to that region by itself. We now turn to what that 
construction should be. 



5.1 A supplementary tail model 

If one does wish to penetrate the deep tail with precision preservation, the 
asymptotic analysis developed in the Appendix to [16] may be used - indeed, the 
exponential base is well adapted to the Gaussian tail. Converting coordinates, 
and introducing just one further group of terms into the series, we find that 

Q(v) = ^2q(a,b) (62) 

where 

a = log(w - 1/2 log(7r)) , b = log(a) (63) 

and 

, , N b \-\ 6 2 -66+14 2b 3 - 216 2 + 1026 -214 

q(a, b ) ~ a - - + i 2 - + — + — 

2 a lba z 96a d /g^\ 

36 4 - 46k 3 + 3486 2 - 14886 + 2978 ^ , 5n 
+ 384^ + ^ " 5 ) 

This again has precision better than 1.06 x 10~ 9 , now in the region v > 37, and 
indeed becomes more precise as v — > +oo, as shown in Fig. 5. 
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Figure 5: Precision of supplementary tail model in v > 37. 



5.2 Real- world precision in CH — \- 

The following results are indicative of what happens in practice. The quantile 
was tested in the Bloodshed DEVC++ environment under Windows XP, using 
the listing in the Appendix. The output was benchmarked with all variables 
double against the internal high-precision quantile in Mathematica, and found 
to preserve the 0(1CP 9 ) precision. The plot of the precision of the C++ output 
is shown in Fig. 6. 



Figure 6: Precision of (7, 7) "breakless" C++ model in double precision. 

We do not make any claim that this algorithm is universally better than 
any other, regardless of whether one is working on a CPU or GPU. Rather, the 
point is that we can, by a change of variable, extend the interval over which we 
can cover the quantile accurately by a very large margin. The relative benefits 
of avoiding any IF-statement need to be assessed on a variety of computer ar- 
chitectures and compilers, and variations to the method above may be needed. 
It is certainly straightforward to generate other single-patch rational approx- 
imations with different properties. Each time we increase the degree of the 
numerator and denominator, keeping the interval fixed, the maximum relative 
error decreases by a factor of about 20. For example, a (12, 12) rational approx- 
imation exists that covers the same interval < v < 37 with maximum relative 
error in Mathematica less than 5 x 10 -16 , and in C++ with a meaningful "long 
double" the error remains below 7 x 1CU 16 . Alternatively a quite modest in- 
crease in computation allows the interval to be extended significantly. An (8, 8) 
approximation exists with precision about 6 x 1CP 10 on the range < v < 74, 
corresponding, with reflection to u E [e, 1 — e] with e = 3.6 x 10 -33 . 
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The Mathematica notebook used to generate such schemes can be obtained 
on request from WS. 

5.3 Benchmark results: CPU vs GPU single precision 

We now turn to a more careful analysis of performance against the popular 
Acklam Level 1 method. The function given in Appendix A was re-written and 
the relevant function listed in Appendix B with some natural source-level op- 
timizations. We call this model ICNDfloatl. Bearing in mind that in float 
(single-precision) mode typical of earlier GPUs, the precision of 0(1CP 9 ) is 
redundant, and we proposed for general single-precision use the listing in Ap- 
pendix C. We call this ICNDf loat2. This is the algorithm we propose for optimal 
GPU normal simulation based on quantiles. If implemented in double precision 
the maximum relative error is less than 4 x 1CP 7 . In practice in float form it 
gave results slightly better than the float form of the Acklam result, particularly 
near the branch point. 

First, consider why any improvement at all might be expected. On a GPU 
it is typically the case that a number of threads are executed at the same time. 
However, the GPU architecture is such that the timing of such a multi-threaded 
computation is influenced by the slowest outcome of any of the branches that 
are executed. In the Acklam model there is a fast rational approximation with 
no special function calls in the central region. In the tail there is the operation 
of taking a log followed by a square root. In the Moro model a log(log()) 
operation is carried out in the tail region. AS241 also uses composite special 
function calls. On a CPU the timing of the algorithm benefits from the fast 
central algorithm and the tail algorithm slows the routine down only on (for 
the Acklam case) 4.85% of calls. This is highly efficient on a CPU architecture 
that processes each calculation separately. A simple timing was done using 
the Bloodshed DEV C++ compiler on an Intel 2.8GHz machine. In each case 
the simple internal rnd function, normalized to return values of U in the unit 
interval, was run a billion times without the normal quantile call and then with 
the normal quantilcs we are considering 2 . The timings for calling the quantile 
obtained by subtracting the two results on the CPU were as follows (results in 
seconds): 

• Acklam Level 1; CPU: 59s 

• ICNDfloatl; CPU: 89s: 

• ICNDfloat2; CPU: 82s 

In each case the overhead of calling rnd was about 15s. These results demon- 
strate clearly the efficiency of the Acklam approach in a traditional architecture. 

For a proper GPU analysis the code was ported initially to an 8400GS GPU 
and re-run in the same way. For a fair comparison the Acklam algorithm was 
optimized. Timings for the quantile call were as follows 

The benefit of working in branchless form is now clear. The improvement, 
though modest, can make a difference, especially if one is solving an SDE via 
many calls to a normal random variable prior to evaluating a payoff. 

2 The rnd() function is of course completely unsuitable for real-world use, but given that 
we only need a method for sampling the various regions and subtract the overhead, its use 
here is fine. 
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Algorithm 


Timing t[s] 


Acklam 


5.04 


ICNDfloatl 


4.89 


ICNDfloat2 


4.64 



Tabic 1: Single precision timings for normal quantilc on GPU 

In the full philosophy of this paper, one would of course use an exponential 
base for many different computations and distributions and possibly pre-store 
a large set of exponential samples created by efficient methods. The overhead 
of converting to normal is then the evaluation of a simple rational function and 
the performance benefits are magnified many times over those given in Table 1 . 

5.4 High precision work 

One can also consider working to double precision on a modern TESLA GPU. 
The first matter to establish is the quality of standard methods. There are two 
well-known candidates. These are 

1. Wichura's AS241; 

2. The refined Acklam method, where the level one approximation is fed once 
through a Newton-Raphson-Halley method. 

How do we to a quality check on such high precision methods? We will use the 
Mathematica function InverseErf as our benchmark. However, this will not be 
done blindly on the assumption that it is necessarily correct. The quantile based 
on this has been independently assessed against the known exact solution for 
the Gaussian quantile developed by Steinbrecher and Shaw [16] that is known in 
series form. The formula for this in a computation-suitable representation is also 
available at http:/ /en. wikipedia.org/wiki/Probit and as a series has been coded 
up both in Mathematica and quadruple-precision FORTRAN based on the Ab- 
soft compiler. Based on these three implementations various cross-verifications 
have been carried out. For example, the quad-precision FORTRAN code that 
agrees with Mathematica 's internal InverseErf to a precision of better than 
1CP 29 on the interval [e, 1 — e], with e = 0.0007. Near the centre of the unit 
interval the truncated series written in Mathematica agrees with InverseErf to 
much better than quad precision. So we have considerable confidence that our 
benchmark is precise enough for any double-precision evaluation. 

A precision test of AS241 was carried out in previous work [14] and the 
relative precision of a Mathematica representation of AS241 is shown in Fig. 7 
This confirms the double-precision quality of the algorithm. We obtained less 
satisfactory results with the refined Acklam scheme. While the relative error is 
typically of order 10~ 15 away from the middle or tail, there is a substantial loss 
of precision in the middle and the tail. The Newton-Raphson-Halley refinement 
was done first exclusively in Mathematica. The precision near the middle is 
shown in Fig. 8, and the tail is shown in Fig. 9. Similar loss of precision was 
found in the implementation by J. Lea in C/C++ of the refined method, using 
the Cody formula for the CDF. Based on these observations we dispute the claim 
that this algorithm is machine precision. Indeed, it appears that in the tail the 
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Figure 7: Precision of AS241 
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Figure 8: Precision of refined Acklam - centre region 

refined algorithm may be less precise than the level one formula. The suspicion 
is that the subtraction of two nearby numbers in the Newton-Raphson-Hallcy 
scheme is causing a problem that is then amplified by the huge inverse density 
in the tails. 

We now turn attention to real- world precision and performance in C/C++ 
using a double type specification in AS241 and our own proposal. The C++ 
implementation for AS241 is that supplied by John Burkardt at 

http: //people . sc . f su. edu/~burkardt/cpp_src/asa241/asa241 .html 

For completeness we also considered the coding of the refined Acklam algorithm 
supplied by Jeremy Lea at 

http: //home . online .no/~pjacklam/notes/ invnorm/impl/lea/lea. c 

Our own suggestion for double precision work is listed in Appendix D. The 
theoretical precision of this algorithm when evaluated in arbitrary precision in 
Mathematica is O(10~ 15 ) on the interval [e, 1 — e], with e < 10~ 32 , and so is 
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Figure 9: Precision of refined Acklam - left tail region 

good for any practical size Monte Carlo 3 . In practice the precision in a double 
implementation in C/C++ is similar to AS241. We have not yet evaluated the 
performance on a TESLA-class GPU in double precision, but even the CPU 
timings are revealing and are as follows, for half a billion samples on [0, 0.5]. 



Algorithm 


Timing t[s] 


Acklam (refined)-Lea 


179 


AS241-Burhardt 


104 


GPU DP model 


82 



Table 2: Double precision timings for normal quantile algorithms on CPU 

Due to the elimination of branches and the avoidance of calls to a sqrt (log() ) 
operation we expect the GPU advantage to be better still. The refined Acklam 
method is slow probably due to expensive calls to the error function for all ar- 
guments - the GPU method is now more than twice as fast even when evaluated 
on a CPU, notwithstanding our precision issues. AS241 stands up well as a high 
precision benchmark but it is now possible to proceed faster. We re-iterate that 
the single precision form of the Acklam method remains optimal for float-class 
calculations on a CPU, but is also outrun on a GPU by an optimized algorithm. 
Further optimizations may of course be possible - the codes presented here in 
Appendices C and D are our current optimal forms and may be subject to fur- 
ther improvement as regards speed and precision. We will also explore OpenCL 
implementations. 



3 The value of e is now so small that we could in fact add a break and a tail model without 
compromising GPU efficiency, as the probability of probing the tails is now so small. 
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6 Conclusions 

In the post-credit-crunch environment, risk simulations depend critically on hav- 
ing a realistic (fat-tailed) model of asset returns. The methods developed here 
allow traditional Gaussian samples to be converted to other distributions via the 
application of the solution differential equation to the samples. The differential 
equation is the recycling ODE for transforming samples from a density /i to a 
density /2, and is 



We have given an explicit example for the Student t case, where a power series 
emerges coupled to a tail model. Other more complicated distributions with 
an explicit density may be handled similarly or numerically, and other base 
distributions may be treated. In particular we can use changes of variable to 
construct "essentially IF-less" algorithms for objects like the normal quantilc. 
The efficiency of such algorithms in GPU computation is of interest, and the 
methods introduced here can be considered for other target distributions. In 
contrast to the normal case, where there are no parameters beyond the transla- 
tion and scale, we must first solve the RODE with the relevant parameters and 
then develop a suitable fast approximation. 

These methods also simplify the use of a Gaussian or T-copula, since the 
two steps of mapping to the unit hypercube and back to the marginals may be 
folded together into one operation, where the solution to the RODE is applied 
directly in one step. 

Of course, the methods developed here rely on the ability to compute the log- 
arithmic derivatives of the target and base densities. Where the target density 
is not known explicitly, but whose characteristic function is known, other meth- 
ods must be used. Investigations of the resulting integro-differential equations 
will be reported elsewhere. 

We have reported a new formula for the normal quantile and demonstrated 
a modest performance benefit on a GPU architecture by working in a branchless 
form for single precision work. Initial CPU tests on Double precision variations 
suggest more significant performance enhancements. 
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where 



Hi = -^-log[fi{x)] 
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Appendix A: C-\ — |- listing for precision testing 

This is the C++ listing for the test program for the "breakless" positive normal 
quantile used to generate the output in Fig. 6, when compared with the internal 
high-precision quantile in Mathcmatica. 

//breaklessquantile . epp 
#include <cmath> 
#include <iostream> 
#include <fstream> 
using namespace std; 

double BreaklessQuantile (double u) 
{ 

double v=-log(2*(l-u)) ; 
double P = 1.2533141359896652729 + 
v* (3. 0333178251950406994 + 
v* (2. 3884158540184385711 + 

v* (0.73176759583280610539 + 

v* (0.085838533424158257377 + 

v* (0.0034424140686962222423 + 
(0.000036313870818023761224 + 
4 . 3304513840364031401e-8*v) *v) ) ) ) ) ; 

double Q=l+v* (2. 9202373175993672857 + 
v* (2. 9373357991677046357 + 
v* (1.2356513216582148689 + 
v* (0.2168237095066675527 + 

v* (0.014494272424798068406 + 
(0.00030617264753008793976 + 
1 . 3141263119543315917e-6*v) *v) ) ) ) ) ; 

return v*P/Q; 

>; 
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// The function is all above - that below is the simple test program. 

int mainO 
{ 

double q; 
double quant ile; 

char name [5] ; 
int k,m; 

cout << "Dutputting test values of breakless quantiles " « "\n"; 
of stream out ("breaklessquantiles .txt") ; 
for (k=5000; k<=9999; k++) 
{q = k/10000. ; 

quantile = BreaklessQuantile(q) ; 
out .precision(12) ; 

out « q «","« quantile « "\n";> 

cout << "Output written to breaklessquantiles.txt \n" ; 
cout << "Hit any key to quit \n" ; 
cin » name; 
return (0) ; 

} 



Appendix B: ICNDfloatl listing 

Here is full quantile form of the function in Appendix A in a form suitable for 
GPU work under CUDA. 

#include <cmath> 
using namespace std; 

#def ine BQP (v) (Pl+v* (P2+v* (P3+v* (P4+v* (P5+v* (P6+ (P7+P8*v) *v) ) ) ) ) ) 
#def ine BQQ (v) (Ql+v* (02+v* (Q3+v* (04+v* (Q5+v* (06+ (Q7+Q8*v) *v) ) ) ) ) ) 
float ICNDfloatl (float v) 
{ 



const 


float 


PI 




1 


2533141359896652729 




const 


float 


P2 




3 


0333178251950406994 




const 


float 


P3 




2 


3884158540184385711 




const 


float 


P4 







73176759583280610539 ; 


const 


float 


P5 







085838533424158257377 ; 


const 


float 


P6 







0034424140686962222423 ; 


const 


float 


P7 







000036313870818023761224 ; 


const 


float 


P8 




4 


3304513840364031401e-8; 


const 


float 


Ql 




1 


0; 




const 


float 


Q2 




2 


9202373175993672857 




const 


float 


Q3 




2 


9373357991677046357 




const 


float 


Q4 




1 


2356513216582148689 




const 


float 


Q5 







2168237095066675527 




const 


float 


Q6 







014494272424798068406 ; 
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const float Q7 = 0.00030617264753008793976; 
const float Q8 = 1 . 3141263119543315917e-6 ; 
float z; 
int sgn ; 

sgn = (v >= 0.5); 
sgn = sgn - !sgn; 

z = -logf(1.0 - (sgn * ((2.0 * v) - 1.0))); 
return sgn * z * BQP(z) / BQQ(z) ; 



Appendix C: ICNDfloat2 listing 

Here is full quantile form of the optimized single precision algorithm in a form 
suitable for GPU work under CUDA. 

#include <cmath> 
using namespace std; 

#define CQP(v) (Pl+v* (P2+v* (P3+v* (P4+(P5+P6*v) *v) ) ) ) 
#define CQQ(v) (Ql+v* (Q2+v* (Q3+v* (Q4+(Q5+Q6*v) *v) ) ) ) 
float ICNDfloat2 (float v) 



const 


float 


pi 




1. 


. 2533136835212087879 ; 


const 


float 


P2 




1. 


, 9797154223229267471 ; 


const 


float 


P3 




0, 


. 80002295072483916762 ; 


const 


float 


P4 




0. 


, 087403248265958578062 ; 


const 


float 


P5 




0, 


. 0020751409553756572917 ; 


const 


float 


P6 




4 


, 744820732427972462e-6 ; 


const 


float 


Ql 




1, 


.0; 


const 


float 


Q2 




2. 


,0795584360534589311; 


const 


float 


Q3 




1. 


,2499328117341603014; 


const 


float 


Q4 




0. 


, 23668431621373705623 ; 


const 


float 


Q5 




0. 


.0120098270559197768; 


const 


float 


Q6 




0. 


. 00010590620919921025259 ; 


float 


z; 











int sgn ; 

sgn = (v >= 0.5); 
sgn = sgn - !sgn; 

z = -logf(1.0 - (sgn * ((2.0 * v) - 1.0))); 
return sgn * z * CQP(z) / CQQ(z); 



Appendix D: Double branchless quantile 

Here is optimized double precision branchless algorithm. 

#include <cmath> 
using namespace std; 
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#def ine EQP (v) (Pl+v* (P2+v* (P3+v* (P4+v* (P5+v* (P6+v* (P7+v* (P8+v* (P9+v* 
(PlO+v* (Pll+v* (P12+V* (P13+P14*v) )))))))))))) 

#def ine EQQ (v) (Ql+v* (Q2+v* (Q3+v* (Q4+v* (Q5+v* (Q6+v* (Q7+v* (Q8+v* (Q9+v* 
(QlO+v* (Qll+v* (Q12+V* (Q13+Q14*v) )))))))))))) 

double EDPBreaklessInvCNDgpu (double v) 

{ const double PI = 1.2533141373154989811; 

const double P2 = 5.5870183514814983104; 

const double P3 = 9.9373788223105148469; 

const double P4 = 9.11745910783758368; 

const double P5 = 4.6865666928347513004; 

const double P6 = 1.3841649695441184484; 

const double P7 = 0.23434950424605615377; 

const double P8 = 0.022306824510199724768; 

const double P9 = 0.0011538603964070818722; 

const double P10 = 0.000030796620691411567563; 

const double Pll = 3 . 9115723028719510263e-7 ; 

const double P12 = 2 . 0589573468131996933e-9 ; 

const double P13 = 3 . 3944224725087481454e-12 ; 

const double P14 = 7 . 3936480912071325978e-16 ; 

const double Ql = 1.00000000000000000000; 

const double Q2 = 4.9577956835689939051; 

const double Q3 = 9.9793129245112074476; 

const double Q4 = 10.574454910639356539; 

const double Q5 = 6.4247521669505779535; 

const double Q6 = 2.3008904864351121026; 

const double Q7 = 0.48545999687461771635; 

const double Q8 = 0.059283082737079006352; 

const double Q9 = 0.0040618506206078995821; 

const double Q10 = 0.00014919732843986856251; 

const double Qll = 2 . 7477061392049947066e-6 ; 

const double Q12 = 2 . 2815008011613816939e-8 ; 

const double Q13 = 7 . 0445790305953963457e-ll ; 

const double Q14 = 5. 1535907808963289678e-14; 

double z; 

int sgn; 

sgn = (v >= 0.5) ; 
sgn = sgn - !sgn; 

z = -log(1.0 - (sgn * ((2.0 * v) - 1.0))); 

return sgn*z*EQP(z)/EQQ(z) ; 

} 



